Day 17 - Regular expressions - Groups


I decompose the rest of the line using the year ([0-9]{4}:) as an anchor. Placing the closing

square bracket ] outside the group removes it from the output. The HTTP protocol version is

either HTTP/1.0 or HTTP/1.1, and the HTTP status is always a three-digit number ([0-9]{3}).
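
The effect of leaving the bracket outside the group can be tried on a small example. The following is a minimal sketch, not the exercise solution itself: the variable names line and result and the shortened input string are assumptions made up for illustration, and sed -E is used to get extended regular expressions.

```shell
# Hypothetical, shortened input: timestamp in brackets, protocol, status.
line='[17/May/2015:10:05:03] HTTP/1.1 200'

# Group 1: timestamp, anchored on the year ([0-9]{4}:).
# The closing bracket \] is matched outside any group, so it is dropped.
# Group 2: protocol (HTTP/1.0 or HTTP/1.1). Group 3: three-digit status.
result=$(printf '%s\n' "$line" |
  sed -E 's#^(\[[0-9]{2}/[A-Za-z]{3}/[0-9]{4}:[0-9:]{8})\] (HTTP/1\.[01]) ([0-9]{3})$#\1 \2 \3#')

echo "$result"
```

Only the text captured by the three groups is written back, so the output keeps the opening bracket (inside group 1) and loses the closing one.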

Go back to the exercise

Exercise 17.02

The file simple.log contains lines with requests concerning files like

83.149.9.216 [17/May/2015:10:05:03 GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1 200 203023 http://semicomplete.com/presentations/logstash-monitorama-2013/

Extract a list of all file extensions and count them. Assume that extensions are made of lowercase

letters only.

Solution

There are many ways to solve this exercise. One possible solution is to use lookaround expressions

with grep to isolate the file path and later the file extension.

$ cat simple.log | grep -Po "(?<=GET )\/.*(?= HTTP)" | grep -Po "(?<=\.)[a-z]+$" | sort | uniq -c

4 access

10 c

14 conf

1 cpp

1458 css

4 deb

[...]
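
To see what each grep stage contributes, the pipeline can be run on the single sample line from the exercise. This is a sketch under two assumptions: the variable names line, path and ext are made up for illustration, and grep supports the -P option (GNU grep built with PCRE), as the solution above already requires.

```shell
# Sample line from the exercise, stored in a hypothetical variable.
line='83.149.9.216 [17/May/2015:10:05:03 GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1 200 203023 http://semicomplete.com/presentations/logstash-monitorama-2013/'

# Stage 1: the lookbehind (?<=GET ) and the lookahead (?= HTTP) delimit
# the match without being part of it, so only the request path is printed.
path=$(printf '%s\n' "$line" | grep -Po '(?<=GET )/.*(?= HTTP)')

# Stage 2: keep the lowercase letters that follow a dot and end the path.
ext=$(printf '%s\n' "$path" | grep -Po '(?<=\.)[a-z]+$')

echo "$path"
echo "$ext"
```

The lowercase http:// in the referrer field does not match the uppercase HTTP in the lookahead, so the first stage stops at the protocol field. Paths without an extension produce no output in the second stage and therefore do not appear in the count.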

Go back to the exercise

Exercise 17.03

There are three lines in the file simple.log where a request received an HTTP 500 status code, for

example